XTTS
XTTS in SkyrimNet: the Default-Quality TTS
XTTS (Cross-lingual Text-to-Speech) is a powerful, deep-learning-based TTS engine that brings realistic, emotionally expressive, and cloneable voices to Skyrim. Unlike simpler TTS engines, XTTS can replicate a specific voice from a short audio clip, making it ideal for immersive, character-specific dialogue in modded Skyrim.
In SkyrimNet, XTTS is used via a local HTTP endpoint, making it easy to integrate and fast enough for real-time use.
It's currently considered the default voice generation system in SkyrimNet, especially for voice cloning and good emotional fidelity.
What XTTS Does
XTTS converts any input text into high-quality, expressive speech, optionally mimicking a specific voice using a voice reference sample.
Input:
- Text: "You're not from around here, are you?"
- Voice sample: 30-second clip of a female Nord NPC
Output:
- High-fidelity audio of that line, spoken in the same voice and tone as the sample
XTTS produces rich, natural speech, with subtle pauses, intonation, and personality, which suits Skyrim's varied characters.
How XTTS Works in SkyrimNet
XTTS is not currently embedded into SkyrimNet the way Piper is. Instead, it runs as a separate local TTS service, typically at `http://localhost:8020` (the server URL used in the setup steps later in this guide).
Here's how SkyrimNet uses it:
- SkyrimNet sends a request to the XTTS server with:
  - The text to speak
  - Optional voice reference audio
  - Optional speaker ID or emotion hints
- XTTS returns a fully rendered WAV or PCM audio clip
- SkyrimNet plays the audio in-game, synced with dialogue
This architecture keeps SkyrimNet lightweight while still offering powerful voice features via XTTS.
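The request flow above can be sketched in Python. This is only an illustration of the idea, not SkyrimNet's actual client code: the endpoint path `/tts_to_audio/` and the JSON field names `text`, `speaker_wav`, and `language` are assumptions, so check them against the API documentation of the server build you run.

```python
import json
import urllib.request

# The base URL matches the setup section of this guide.
XTTS_BASE_URL = "http://localhost:8020"
# Assumption: endpoint path and field names may differ on your server build.
TTS_ENDPOINT = "/tts_to_audio/"


def build_tts_request(text, speaker, language="en", base_url=XTTS_BASE_URL):
    """Build (but do not send) a POST request asking the server to speak `text`."""
    payload = {"text": text, "speaker_wav": speaker, "language": language}
    return urllib.request.Request(
        base_url.rstrip("/") + TTS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, speaker, out_path, timeout=30.0):
    """Send the request and save the returned audio bytes to `out_path`."""
    request = build_tts_request(text, speaker)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With a server running, a call such as `synthesize("You're not from around here, are you?", "adrianne", "line.wav")` would write the returned clip to disk; `adrianne` is the example voice name used later in this guide.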
Key Features of XTTS in SkyrimNet
- Voice Cloning: Easily assign unique voices to NPCs using short reference clips
- Cross-lingual Support: Speak English in a French, Argonian, or Dunmer accent
- Emotion Control (planned): Adjust mood and tone of delivery for immersive reactions
- Reusable Voices: Store and reuse custom voices for followers, companions, or even the player
XTTS vs Piper
Feature | Piper (In-Process) | XTTS (External API) |
---|---|---|
Speed | Very fast | Slower (1-2 s latency) |
Voice Quality | Good | Excellent |
Voice Cloning | Not supported | Full support |
Integration | Native DLL | HTTP endpoint |
Why XTTS is SkyrimNet's Default Quality TTS
- Offers good audio realism
  Natural cadence, clear articulation, and emotional depth, ideal for immersive dialogue.
- Supports voice reuse and identity
  Easily assign consistent voices to NPCs using short reference samples.
- Enables AI-driven dialogue to feel grounded and believable
  Dynamic lines generated by LLMs sound intentional, like a real voice actor spoke them.
- Works with any line, whether typed in or LLM-generated, and makes it sound intentional
  Perfect for branching narratives, roleplay mods, and reactive NPC behavior.
Setting Up XTTS Mantella API Server for SkyrimNet
Follow these steps to set up XTTS as your TTS backend:
Step 1: Download and Extract
- Download the XTTS Mantella API Server from its Nexus Mods page.
- Unzip it to a folder of your choice (avoid system folders like `C:\Program Files`).
- Download the latent speaker folder for the language(s) you plan to use (also on the same Nexus page).
- Extract the speaker folder into the same directory as the server.
Step 2: Start the XTTS Server
- Launch `xtts-api-server-mantella.exe` inside the extracted folder.
- On first launch, it will prompt you to confirm several settings. You can press `Enter` to accept defaults.
Recommended Settings:
- Device:
  - Use `cuda` if you have an NVIDIA GPU
  - Use `cpu` otherwise
- Deepspeed:
  - Set to `yes` only if you have an NVIDIA GPU that supports it (check Nexus description for compatible cards)
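If you are unsure which Device value applies to your machine, the rule above can be written as a small Python helper. The check for `nvidia-smi` on the PATH is only a rough heuristic assumed here for illustration; it is not something the XTTS server itself does, and it says nothing about Deepspeed compatibility.

```python
import shutil


def pick_device(has_nvidia_gpu):
    """Return the value to enter at the server's Device prompt."""
    return "cuda" if has_nvidia_gpu else "cpu"


def nvidia_driver_present():
    """Heuristic (an assumption): NVIDIA drivers usually put nvidia-smi on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Suggested device:", pick_device(nvidia_driver_present()))
```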
Step 3: Configure SkyrimNet
In the SkyrimNet Web UI:
- Go to `Test and Easy Setup`
- Under Text-to-Speech, set:
  - TTS Backend → `XTTS`
  - TTS Server URL → `http://localhost:8020` (or your XTTS server's IP address if running on a separate machine)
You're now ready to generate voices using XTTS!
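If SkyrimNet cannot reach the server, it can help to confirm that something is listening at the configured URL. The Python sketch below only tests that a TCP connection can be opened; it does not verify that the listener is actually XTTS. The function names are illustrative.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Split a server URL into (host, port), defaulting the port by scheme."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = "http://localhost:8020"
    print(url, "reachable:", is_reachable(url))
```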
For Mantella XTTS Users: Fast & Easy Way to Make a Custom Voice Latent
Want your custom NPC to use a unique voice, or fix a vanilla one that doesn't quite fit? Here's how to create a high-quality voice latent (custom voice model) using just a `.wav` file.
Step-by-Step Guide
Step 1: (Optional) Get a Clean Voice Sample
If you already have a clean `.wav` sample, skip to Step 4.
- Download LazyVoiceFinder.
- Read the mod description carefully to install requirements.
- Extract the tool outside your game or Windows folders.
- Download the Patch from the "Update Files" section and overwrite the original files.
Step 2: Extract a Voice Line
- Launch `LazyVoiceFinder.exe`.
- Select `Skyrim` from the Game Mode dropdown.
- Click `File` → `Open` (for vanilla/DLC voices) or `File` → `Open from file` (for modded voices).
- Use the filters to find voice lines by:
  - Plugin
  - Voice type
  - Dialogue content
Example:
- Adrianne Avenicci uses `FemaleCommander`, but you want a version that better reflects her subtle Imperial accent.
- Use keywords like `"I don't claim to be the best blacksmith..."` in `Dialogue 1`.
- Click the green play button to preview.
- Right-click the best-sounding line → `Copy voice file as WAV Format`.
- Paste the `.wav` into your `XTTS\speakers\en` folder.
Step 3: (Optional but Recommended) Clean & Convert the Audio
A clean sample = a better latent!
Tips for a good sample:
- Clear voice only, no background sounds or music.
- Natural flow (no long pauses or clipped audio).
- Length: 7-10 seconds is ideal.
- Format: Mono, 22050Hz, 16-bit WAV
Use Audacity:
- Launch Audacity.
- `File` → `Import` → `Audio` → select your `.wav`.
- `Tracks` → `Resample` → enter `22050 Hz`.
- `Tracks` → `Mix` → `Mix Stereo Down to Mono` (if needed).
- `File` → `Export Audio` → Save as `.wav` (Mono, 22050 Hz, 16-bit).
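To double-check an exported file against the format tips in this step (mono, 22050 Hz, 16-bit, roughly 7-10 seconds), a short Python script using the standard `wave` module can report any mismatches. This is a convenience sketch, not part of the XTTS tooling, and `check_sample` is an illustrative name.

```python
import wave


def check_sample(path, min_seconds=7.0, max_seconds=10.0):
    """Return a list of problems; an empty list means the file matches the tips."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        sample_width = wav.getsampwidth()  # bytes per sample
        duration = wav.getnframes() / float(rate)

    problems = []
    if channels != 1:
        problems.append("expected mono, found %d channels" % channels)
    if rate != 22050:
        problems.append("expected 22050 Hz, found %d Hz" % rate)
    if sample_width != 2:
        problems.append("expected 16-bit, found %d-bit" % (sample_width * 8))
    if not (min_seconds <= duration <= max_seconds):
        problems.append("length is %.1f s, ideal is 7-10 s" % duration)
    return problems
```

For example, `check_sample("adrianne.wav")` returns an empty list when the file is ready to use, or a list of human-readable issues to fix in Audacity.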
Step 4: Generate the Voice Latent
- Move your finalized `.wav` into the `XTTS\speakers\en` folder.
- Rename it (e.g., `adrianne.wav`).
- Run `xtts-api-server-mantella.exe`.
  - It will automatically generate a `.json` voice latent in `XTTS\latent_speaker_folder\en`.
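After the server has started, you can confirm that a latent was produced for each sample. The Python sketch below assumes the `.json` latent has the same base name as the `.wav` (for example `adrianne.wav` and `adrianne.json`); that naming is inferred from this guide's example rather than confirmed, and `missing_latents` is an illustrative name.

```python
from pathlib import Path


def missing_latents(xtts_root, language="en"):
    """List sample names in speakers/<language> with no matching .json latent."""
    root = Path(xtts_root)
    speakers = root / "speakers" / language
    latents = root / "latent_speaker_folder" / language
    return sorted(
        wav.stem
        for wav in speakers.glob("*.wav")
        # Assumption: the latent keeps the sample's base name.
        if not (latents / (wav.stem + ".json")).exists()
    )
```

Calling `missing_latents("XTTS")` from the folder that contains your `XTTS` directory returns an empty list when every sample has a latent.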
Step 5: Assign the Voice to an NPC
- Launch Skyrim and get near the NPC.
- Open SkyrimNet Web UI.
- Navigate to: `Advanced Configuration → Character Overrides → Nearby → NPC name → Entity → Voice ID`
- Set the Voice ID to your new voice (e.g., `adrianne`).
Done!
Your NPC now speaks with their custom voice! Enjoy the immersion.
Note:
DO NOT share any `.wav` or `.json` latent files unless you own the voice or have clear permission to redistribute.